- ------------------------------------
- TEXTO - Text steganography
- ------------------------------------
-
- Texto is a rudimentary text steganography program that transforms
- uuencoded or PGP ASCII-armoured data into English sentences, and back again.
-
- This program was written to make it easier to exchange binary data,
- especially encrypted data. Why is this necessary? People or programs may be
- reading your mail. Recent events in the US Congress may _require_ Internet
- Service Providers to monitor incoming mail and determine whether or not it
- is "obscene" or lives up to particular parochial moral standards. Since they
- can't scan the contents of an encrypted message, and probably don't have
- time to look at each uuencoded message by hand, such messages will probably
- end up in the bit bucket. This program's output is, with luck, close enough
- to normal English text to slip past any kind of automated scanning.
-
- Texto text files look like something between mad libs and bad poetry
- (although they do sometimes contain deep cosmic truths), and should be close
- enough to normal English to get past simple-minded mail scanners and to
- entertain readers of talk.bizarre.
-
- Texto works like a simple substitution cipher: each of the 64 ASCII symbols
- used by PGP ASCII armour or uuencode is replaced by an English word. Not
- every word in the resulting text is significant; only the nouns, verbs,
- adjectives, and adverbs used to fill in the preset sentence structures carry
- data. Punctuation and "connecting" words (or any other words not in the
- dictionary) are ignored when decoding.
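-
- The following Python sketch illustrates the substitution idea. It is not
- texto's actual code: the alphabet is assumed to be the PGP radix-64 set, the
- 64 dictionary "words" are made-up placeholders, and the sentence templates
- are hypothetical stand-ins for the "structs" file. Encoding fills template
- slots with one word per symbol; decoding keeps only tokens found in the
- dictionary and throws everything else away.

```python
import itertools
import re
import string

# Assumption: the 64 symbols are the PGP radix-64 alphabet (uuencode uses a
# different 64-character set, which would work the same way).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Hypothetical placeholder dictionary: 64 distinct made-up words (8 x 8).
WORDS = [c + v + "n" for c, v in itertools.product(
    "bdfgklmp", ["a", "e", "i", "o", "u", "ay", "ee", "oo"])]
INDEX = {word: i for i, word in enumerate(WORDS)}  # word -> symbol position

# Hypothetical sentence templates; each "{}" slot carries one symbol.
TEMPLATES = ["The {} of the {} was {} near a {}.", "A {} {} quietly."]
FALLBACK = "Why {}?"  # one-slot template used when too few symbols remain


def encode(data: str) -> str:
    """Replace each symbol with its dictionary word and pour the words into templates."""
    words = [WORDS[ALPHABET.index(ch)] for ch in data]
    sentences, i = [], 0
    for template in itertools.cycle(TEMPLATES):
        if i >= len(words):
            break
        slots = template.count("{}")
        if slots > len(words) - i:
            template, slots = FALLBACK, 1
        sentences.append(template.format(*words[i:i + slots]))
        i += slots
    return " ".join(sentences)


def decode(text: str) -> str:
    """Keep only dictionary words; punctuation and filler words are ignored."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return "".join(ALPHABET[INDEX[t]] for t in tokens if t in INDEX)
```

- For instance, decode("The ban, the bin!") returns "AC": only "ban"
- (position 0) and "bin" (position 2) are dictionary words.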
-
- The main drawback of this program is obvious: the resulting text is about
- 10 times larger than the original data. That is bad to the point of
- uselessness if you need to send a 5MB uuencoded file. What are some possible
- solutions? Using shorter words would yield only a minimal improvement, since
- most of the words are already pretty short and you would still need the same
- number of English words. The best solution I can think of is to use more
- words: one word for every 2 symbols instead of a one-to-one symbol-to-word
- mapping. This requires 4096 (64 x 64) words for each part of speech (finding
- that many adverbs will be a real challenge). Search speed shouldn't become a
- big factor when transforming text back to data, since texto uses a hash table
- for the words and their lengths to minimize search times. The net result
- would probably be an average expansion of ~5x instead of ~10x, which is
- significant enough to warrant trying. Changing the code will be easy; the
- hard part is typing in the dictionaries. Look for this feature in texto 2.0,
- coming Real Soon to a net near you.
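-
- To make the arithmetic concrete, here is a Python sketch of the proposed
- two-symbols-per-word mapping. It is hypothetical: this version of texto does
- not do it, the 4096 "words" are placeholders, and an odd trailing symbol is
- simply returned separately (an assumption; it could be sent using the
- existing 64-word tables). A Python dict plays the role of texto's hash
- table, so the average lookup cost stays roughly constant as the dictionary
- grows.

```python
import string

# Assumption: the PGP radix-64 alphabet.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
POS = {ch: i for i, ch in enumerate(ALPHABET)}

# Hypothetical 4096-entry dictionary; real English words would go here.
PAIR_WORDS = [f"word{n}" for n in range(64 * 64)]
PAIR_INDEX = {w: i for i, w in enumerate(PAIR_WORDS)}  # hash table lookup


def encode_pairs(data: str) -> tuple[list[str], str]:
    """Two symbols per word: index = 64 * first + second (0..4095).
    An odd trailing symbol is returned separately."""
    cut = len(data) - len(data) % 2
    words = [PAIR_WORDS[POS[data[k]] * 64 + POS[data[k + 1]]]
             for k in range(0, cut, 2)]
    return words, data[cut:]


def decode_pairs(words: list[str], tail: str = "") -> str:
    """Split each word's index back into its two symbols."""
    out = []
    for w in words:
        hi, lo = divmod(PAIR_INDEX[w], 64)
        out.append(ALPHABET[hi] + ALPHABET[lo])
    return "".join(out) + tail


def expansion(symbols_per_word: int, chars_per_word: float) -> float:
    """Output characters per input symbol, given the average output cost of
    one data-carrying word (the word plus its share of filler text)."""
    return chars_per_word / symbols_per_word
```

- If each data-carrying word costs about 10 characters of output, counting its
- share of filler (an assumed figure matching the ~10x quoted above), then
- expansion(1, 10) is 10.0 and expansion(2, 10) is 5.0. That is where the ~5x
- estimate comes from, provided the 4096-word vocabulary keeps roughly the same
- average word length.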
-
- Since words are occasionally pluralized and/or gerundized (-ing), and not
- all of them are regular verbs or nouns, there are plenty of strange spelling
- mistakes. While I normally despise misspelled words, they add a nice human
- touch to the repetitive text, and add to the feeling that whoever or
- whatever wrote it was quite clearly out of his/her/its mind.
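-
- A Python sketch of how a decoder might undo the suffixes, assuming (this is
- not taken from texto's source) that it naively chops a trailing "ing" or "s"
- before looking a token up. The same naive rule run in the other direction
- explains the odd spellings: "make" + "ing" gives "makeing", and "fly" + "s"
- gives "flys".

```python
from typing import Optional


def inflect(word: str, suffix: str) -> str:
    """Naive pluralizing/gerundizing: just glue the suffix on (hence 'makeing')."""
    return word + suffix


def chop(token: str) -> str:
    """Assumed decoder rule: strip a trailing 'ing' or 's' unconditionally."""
    token = token.lower()
    if token.endswith("ing"):
        return token[:-3]
    if token.endswith("s"):
        return token[:-1]
    return token


def lookup(token: str, dictionary: set) -> Optional[str]:
    """Return the dictionary word a token stands for, or None for filler."""
    base = chop(token)
    return base if base in dictionary else None
```

- Under this rule a dictionary word such as "bus" would be chopped to "bu" and
- never found, which is why the "words" file must avoid those endings.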
-
-
- Usage:
- ------
-
- texto msgfile > engfile
-     Transforms the contents of msgfile into English text and places the
-     result in "engfile". msgfile must be a uuencoded or PGP ASCII-armoured
-     text file.
-
- texto -p engfile > pgpfile
- texto -p engfile | pgp -f
-     Takes English text from engfile and produces a PGP ASCII-armoured text
-     file, which will be readable by PGP if the original message file was.
-     Alternatively, as in the second form, texto's output can be piped
-     directly into pgp.
-
- texto -u engfile > uufile
- texto -u engfile | uudecode
-     Takes English text from engfile and produces a uuencoded file, which
-     will be readable by uudecode if the original message file was.
-     Alternatively, as in the second form, texto's output can be piped
-     directly into uudecode. NOTE that uudecoding the result always produces
-     a file called "texto.out" with mode 644, unless you redirect texto's
-     output into a file and hand-edit that file.
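-
- Hand-editing means changing the "begin" line from which uudecode takes the
- mode and filename. A hypothetical Python helper (not part of texto) that
- makes the same edit, assuming texto's output starts with the usual
- "begin <mode> <name>" line:

```python
import re


def restore_header(uu_text: str, filename: str, mode: str = "644") -> str:
    """Replace the first 'begin <mode> <name>' line with the original values."""
    return re.sub(
        r"^begin [0-7]{3,4} .*$",
        lambda _match: f"begin {mode} {filename}",  # lambda avoids escape processing
        uu_text,
        count=1,
        flags=re.MULTILINE,
    )
```

- Run it on the file produced by "texto -u engfile > uufile", then uudecode
- the result to get the original name and mode back.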
-
- Installation:
- -------------
-
- This program has only been tested on IRIX 4.0.5, Linux kernel 1.0.x, and
- Solaris 2.3. To build it, just type "make"; on SGIs, use "make sgi" instead.
- If you're on a Solaris machine, or any other machine whose uuencode uses
- spaces instead of backtick (`) characters, uncomment the "DEFINES" line in
- the makefile.
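-
- Both conventions encode a zero-valued 6-bit group: uuencode adds 32 to each
- value, so zero becomes a space, and many implementations write a backtick
- (ASCII 96) instead because trailing spaces tend to get stripped in transit.
- Presumably a 64-word table can map only one of the two, which would explain
- the DEFINES switch. Python's binascii module (3.7 or later) can produce
- either form, as this sketch shows; to_backtick_form is a hypothetical
- helper, not something texto provides.

```python
import binascii


def to_backtick_form(body_line: str) -> str:
    """Rewrite a uuencoded *body* line (not the 'begin' header) so zero values
    use backticks instead of spaces; the decoded bytes are unchanged."""
    return body_line.replace(" ", "`")


data = b"\x00\x00\x00hi"
spaces = binascii.b2a_uu(data)                # zero groups written as ' '
ticks = binascii.b2a_uu(data, backtick=True)  # zero groups written as '`'

# Both forms decode to the same bytes, and one converts into the other.
assert binascii.a2b_uu(spaces) == binascii.a2b_uu(ticks) == data
assert to_backtick_form(spaces.decode("ascii")).encode("ascii") == ticks
```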
-
-
- Rolling your own:
- -----------------
-
- The (usually correct) English sentence structures are in the file
- "structs", which is basically a file of mad-lib-style "fill in the blank"
- sentences. Feel free to add your own; just be really, really careful not to
- use any of the words in the "words" file. You're safe if you use words that
- you see elsewhere in the "structs" file. Using varying "structs" files could
- at least annoy mail scanners; using different "words" files as well should
- totally defeat them.
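-
- One way to automate the "be really careful" rule is a check like the
- following Python sketch. It assumes, purely for illustration, that a
- template marks its blanks with "{}" (the real "structs" syntax may differ)
- and that the decoder chops a trailing "ing" or "s" before lookup. It reports
- every literal word in a template that a decoder would mistake for data.

```python
import re


def chop(token: str) -> str:
    """Assumed decoder rule: strip a trailing 'ing' or 's'."""
    if token.endswith("ing"):
        return token[:-3]
    if token.endswith("s"):
        return token[:-1]
    return token


def colliding_words(template: str, dictionary: set) -> set:
    """Literal template words that would decode as dictionary words."""
    literals = re.findall(r"[a-z]+", template.lower())
    return {w for w in literals if w in dictionary or chop(w) in dictionary}
```

- An empty result means the template is safe to add.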
-
- The 64 verbs, 64 adjectives, 64 adverbs, 64 places, and 64 things used to
- fill in the blanks are in the "words" file. Again, feel free to add your
- own, but be careful. Don't use words that end in "s" or "ing" (they'll get
- chopped), and don't use words that are already in there (you can
- double-check with the command "sort words | uniq -d"). The order of the
- words in each section of the file is also significant: rearranging the
- nouns, for example, will change the result.
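-
- The rules above can be checked mechanically. This Python sketch assumes the
- "words" file has already been split into named sections (the file's actual
- layout is not described here). Like "sort words | uniq -d", it looks for
- duplicates across the whole file; it also flags wrong section sizes and
- forbidden endings. It deliberately reorders nothing, since each word's
- position is its value.

```python
from collections import Counter


def check_words(sections: dict, size: int = 64) -> list:
    """Return human-readable problems found in a parsed "words" file."""
    problems = []
    for name, words in sections.items():
        if len(words) != size:
            problems.append(f"{name}: {len(words)} words, expected {size}")
        for word in words:
            if word.endswith(("s", "ing")):
                problems.append(f"{name}: '{word}' ends in s/ing and would be chopped")
    # Like "sort words | uniq -d": duplicates anywhere in the file.
    counts = Counter(word for words in sections.values() for word in words)
    for word, n in sorted(counts.items()):
        if n > 1:
            problems.append(f"duplicate: '{word}' appears {n} times")
    return problems
```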
-
- If you use a modified "words" file, the person on the other end of your
- communication must of course be using the same one, or the transformation
- will fail miserably. The "structs" file, however, is irrelevant to
- decoding: it can be modified to suit your taste or literary style, so long
- as it doesn't conflict with the "words" file as mentioned above. Because the
- structs file is not used when decoding text, two people can still
- communicate whether or not they have the same "structs" file.
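-
- A toy Python demonstration of both points, using a hypothetical 4-symbol
- alphabet and 4-word dictionary rather than texto's real 64-entry tables:
- text produced from two different sets of templates decodes identically, but
- decoding with a reordered word list gives the wrong symbols.

```python
import itertools
import re

ALPHABET = "ABCD"                         # toy alphabet (texto uses 64 symbols)
WORDS = ["owl", "moon", "lake", "stone"]  # toy dictionary; position = value


def encode(data: str, templates: list, words: list) -> str:
    """One word per symbol, poured into one-slot templates in rotation."""
    return " ".join(
        template.format(words[ALPHABET.index(ch)])
        for ch, template in zip(data, itertools.cycle(templates))
    )


def decode(text: str, words: list) -> str:
    """Only the word list matters here; the templates are never consulted."""
    index = {w: i for i, w in enumerate(words)}
    tokens = re.findall(r"[a-z]+", text.lower())
    return "".join(ALPHABET[index[t]] for t in tokens if t in index)


structs_a = ["The {} glowed."]
structs_b = ["Beware the {}!", "Never trust a {} at night."]
message = "BADCAB"

text_a = encode(message, structs_a, WORDS)  # "The moon glowed. The owl glowed. ..."
text_b = encode(message, structs_b, WORDS)  # different prose, same data words
```

- Both texts decode to "BADCAB" with WORDS, while decoding text_a with the
- word list reversed yields "CDABDC".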
-
- BUGS
- ----
-
- uuencoded files lose the mode and filename information, which is a bummer.
- Always writing to stdout may not be the best way to go.
- The text produced by texto'ing a uuencoded file can be _really_ repetitive.
- The 64-word dictionaries vs. the 4096-word ones, as mentioned above.
- Texto is a dorky name, but it sort of rhymes with stego.
- Please report any other bugs or fixes to kmaher@ucsd.edu
-
- LICENSE
- -------
-
- Copying, modifications, improvements, etc. are highly encouraged; just
- let me know so I can incorporate them.
-
- All rites reversed.
-
- AUTHOR
- ------
-
- Kevin Maher
- kmaher@ucsd.edu
- Underware Software Production Ltd. Inc. etc.
- "Covering your ass since 1981"
-
-